k-modes algorithm
Clustering Categorical Data: Soft Rounding k-modes
Gavva, Surya Teja, S., Karthik C., Punna, Sharath
Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis. Despite the proposal of various clustering algorithms, the classical k-modes algorithm remains a popular choice for unsupervised learning of categorical data. Surprisingly, our first insight is that in a natural generative block model, the k-modes algorithm performs poorly for a large range of parameters. We remedy this issue by proposing a soft rounding variant of the k-modes algorithm (SoftModes) and theoretically prove that our variant addresses the drawbacks of the k-modes algorithm in the generative model. Finally, we empirically verify that SoftModes performs well on both synthetic and real-world datasets.
An Efficient $k$-modes Algorithm for Clustering Categorical Datasets
Dorman, Karin S., Maitra, Ranjan
Mining clusters from datasets is an important endeavor in many applications. The k-means algorithm is a popular, efficient, distribution-free approach for clustering numerical-valued data, but it does not apply to categorical-valued observations. We provide a novel, computationally efficient implementation of k-modes, called OTQT. We prove that OTQT finds updates, undetectable to existing k-modes algorithms, that improve the objective function. Thus, although slightly slower per iteration owing to its algorithmic complexity, OTQT is always more accurate per iteration and almost always faster (and only barely slower on some datasets) to reach the final optimum. As a result, we recommend OTQT as the preferred, default algorithm for all k-modes implementations. We also examine five initialization methods and three types of K-selection methods, many of them novel or newly applied to k-modes. By examining performance on real and simulated datasets, we show that simple random initialization is the best initializer and that a novel K-selection method is more accurate than methods adapted from k-means.

Identifying groups of similar observations in datasets is common in a wide array of applications, with many clustering methods developed in statistics, machine learning and the applied sciences [1]-[7]. The k-means algorithm [8]-[11] is arguably the most popular method for clustering numerical-valued observations. It scales to large datasets because it does not require calculation of all pairwise distances, and it is distribution-free. While being distribution-free does not make it assumption-free [12], [13], it is a starting place for users wary of making assumptions about their data. Unfortunately, k-means does not provide an appropriate objective to minimize for datasets with categorical attributes.
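As background for the abstracts above: the classical k-modes iteration replaces k-means' Euclidean distance with the simple matching (Hamming) dissimilarity and replaces cluster means with per-attribute modes. A minimal sketch, assuming categories are integer-encoded; the function name, the explicit-initialization option, and the demo data are illustrative choices, not taken from any of the papers listed here:

```python
import numpy as np

def k_modes(X, k, init_modes=None, n_iter=20, seed=0):
    """Minimal k-modes sketch: assign each point to the nearest mode under
    simple matching (Hamming) dissimilarity, then set each mode attribute
    to the most frequent category among its cluster's members."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    modes = (X[rng.choice(n, size=k, replace=False)] if init_modes is None
             else np.asarray(init_modes)).copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Count attribute mismatches between every point and every mode.
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_modes = modes.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # leave an empty cluster's mode unchanged
            for a in range(d):
                # Per-attribute mode: the most frequent category in the cluster.
                vals, counts = np.unique(members[:, a], return_counts=True)
                new_modes[j, a] = vals[counts.argmax()]
        if np.array_equal(new_modes, modes):
            break  # modes stable: converged
        modes = new_modes
    return modes, labels

# Tiny demo on integer-encoded categorical data: two clearly separated groups.
X = np.array([[0, 0, 0, 1]] * 5 + [[1, 1, 1, 0]] * 5)
modes, labels = k_modes(X, 2, init_modes=X[[0, 5]])
```

Note that ties in `argmin` go to the lowest-index mode, and a fuller implementation (e.g. Huang's original formulation) adds frequency-based tie-breaking and smarter initialization, which is exactly where the papers above intervene.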
A novel initialisation based on hospital-resident assignment for the k-modes algorithm
Wilde, Henry, Knight, Vincent, Gillard, Jonathan
This paper presents a new way of selecting an initial solution for the k-modes algorithm that allows for a notion of mathematical fairness and leverages the data in a way that the common initialisations from the literature do not. The method, which utilises the Hospital-Resident Assignment Problem to find the set of initial cluster centroids, is compared with the current initialisations on both benchmark datasets and a body of newly generated artificial datasets. Based on this analysis, the proposed method is shown to outperform the other initialisations in the majority of cases, especially when the number of clusters is optimised. In addition, we find that our method outperforms the leading established method specifically for low-density data.